Overview

Dataset statistics

Number of variables: 14
Number of observations: 131662
Missing cells: 137546
Missing cells (%): 7.5%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 14.1 MiB
Average record size in memory: 112.0 B
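
The missing-cell percentage can be cross-checked from the other figures in this report: missing cells divided by (variables times observations). The sketch below only redoes that arithmetic; the per-column counts are copied from the Alerts list, and the variable names are illustrative.

```python
# Figures copied from this report; nothing here touches the original data.
n_variables = 14
n_observations = 131662

# Per-column missing counts as listed under Alerts.
missing_by_column = {
    "Type_of_Cab": 20210,
    "Customer_Since_Months": 5920,
    "Life_Style_Index": 20193,
    "Confidence_Life_Style_Index": 20193,
    "Var1": 71030,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 137546
print(round(missing_pct, 1))  # 7.5
```

The per-column counts add up to the reported total, so no other column contributes missing cells.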

Variable types

Categorical: 6
Numeric: 8

Alerts

Trip_ID has a high cardinality: 131662 distinct values (High cardinality)
Var2 is highly correlated with Var3 (High correlation)
Var3 is highly correlated with Var2 (High correlation)
Trip_Distance is highly correlated with Life_Style_Index (High correlation)
Life_Style_Index is highly correlated with Trip_Distance (High correlation)
Surge_Pricing_Type is highly correlated with Type_of_Cab (High correlation)
Type_of_Cab is highly correlated with Surge_Pricing_Type (High correlation)
Type_of_Cab has 20210 (15.3%) missing values (Missing)
Customer_Since_Months has 5920 (4.5%) missing values (Missing)
Life_Style_Index has 20193 (15.3%) missing values (Missing)
Confidence_Life_Style_Index has 20193 (15.3%) missing values (Missing)
Var1 has 71030 (53.9%) missing values (Missing)
Trip_ID is uniformly distributed (Uniform)
Trip_ID has unique values (Unique)
Customer_Since_Months has 10169 (7.7%) zeros (Zeros)
Cancellation_Last_1Month has 68687 (52.2%) zeros (Zeros)

Reproduction

Analysis started: 2022-05-24 16:48:48.387803
Analysis finished: 2022-05-24 16:49:21.721539
Duration: 33.33 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
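
The reported duration is the difference between the two timestamps above. A minimal check using only Python's standard library (the variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block of this report.
started = datetime.fromisoformat("2022-05-24 16:48:48.387803")
finished = datetime.fromisoformat("2022-05-24 16:49:21.721539")

duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")  # 33.33 seconds
```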

Variables

Trip_ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 131662
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Value | Count
T0005785292 | 1
T0005825895 | 1
T0005744541 | 1
T0005835649 | 1
T0005726469 | 1
Other values (131657) | 131657

Length

Max length: 11
Median length: 11
Mean length: 11
Min length: 11

Characters and Unicode

Total characters: 1448282
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
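
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is only a sketch of the idea, not the report's own implementation; the standard library does not expose script names, so scripts are left out.

```python
import unicodedata

trip_id = "T0005785292"  # first Trip_ID value listed in this report

# General category per code point: "Lu" = uppercase letter, "Nd" = decimal number.
categories = [unicodedata.category(ch) for ch in trip_id]
distinct_categories = sorted(set(categories))
print(distinct_categories)  # ['Lu', 'Nd']

# Code points below U+0080 all belong to the Basic Latin (ASCII) block.
all_ascii = all(ord(ch) < 128 for ch in trip_id)
print(all_ascii)  # True
```

This matches the two distinct categories (Uppercase Letter, Decimal Number) and the single ASCII block reported for Trip_ID.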

Unique

Unique: 131662
Unique (%): 100.0%

Sample

1st row: T0005689460
2nd row: T0005689461
3rd row: T0005689464
4th row: T0005689465
5th row: T0005689467

Common Values

Value | Count | Frequency (%)
T0005785292 | 1 | < 0.1%
T0005825895 | 1 | < 0.1%
T0005744541 | 1 | < 0.1%
T0005835649 | 1 | < 0.1%
T0005726469 | 1 | < 0.1%
T0005884833 | 1 | < 0.1%
T0005734561 | 1 | < 0.1%
T0005755309 | 1 | < 0.1%
T0005893474 | 1 | < 0.1%
T0005790738 | 1 | < 0.1%
Other values (131652) | 131652 | > 99.9%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
t0005785292 | 1 | < 0.1%
t0005869578 | 1 | < 0.1%
t0005838326 | 1 | < 0.1%
t0005718172 | 1 | < 0.1%
t0005705272 | 1 | < 0.1%
t0005750972 | 1 | < 0.1%
t0005750608 | 1 | < 0.1%
t0005754912 | 1 | < 0.1%
t0005711913 | 1 | < 0.1%
t0005841422 | 1 | < 0.1%
Other values (131652) | 131652 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
0 | 464856 | 32.1%
5 | 196230 | 13.5%
T | 131662 | 9.1%
8 | 124825 | 8.6%
7 | 124812 | 8.6%
9 | 75555 | 5.2%
6 | 71107 | 4.9%
1 | 65077 | 4.5%
4 | 64832 | 4.5%
3 | 64797 | 4.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1316620 | 90.9%
Uppercase Letter | 131662 | 9.1%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 464856 | 35.3%
5 | 196230 | 14.9%
8 | 124825 | 9.5%
7 | 124812 | 9.5%
9 | 75555 | 5.7%
6 | 71107 | 5.4%
1 | 65077 | 4.9%
4 | 64832 | 4.9%
3 | 64797 | 4.9%
2 | 64529 | 4.9%
Uppercase Letter
Value | Count | Frequency (%)
T | 131662 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1316620 | 90.9%
Latin | 131662 | 9.1%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 464856 | 35.3%
5 | 196230 | 14.9%
8 | 124825 | 9.5%
7 | 124812 | 9.5%
9 | 75555 | 5.7%
6 | 71107 | 5.4%
1 | 65077 | 4.9%
4 | 64832 | 4.9%
3 | 64797 | 4.9%
2 | 64529 | 4.9%
Latin
Value | Count | Frequency (%)
T | 131662 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1448282 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 464856 | 32.1%
5 | 196230 | 13.5%
T | 131662 | 9.1%
8 | 124825 | 8.6%
7 | 124812 | 8.6%
9 | 75555 | 5.2%
6 | 71107 | 4.9%
1 | 65077 | 4.5%
4 | 64832 | 4.5%
3 | 64797 | 4.5%

Trip_Distance
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10326
Distinct (%): 7.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.20090854
Minimum: 0.31
Maximum: 109.23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 0.31
5-th percentile: 10.6
Q1: 24.58
Median: 38.2
Q3: 60.73
95-th percentile: 93.75
Maximum: 109.23
Range: 108.92
Interquartile range (IQR): 36.15

Descriptive statistics

Standard deviation: 25.52288172
Coefficient of variation (CV): 0.5774288938
Kurtosis: -0.1463664059
Mean: 44.20090854
Median Absolute Deviation (MAD): 16.95
Skewness: 0.7237521954
Sum: 5819580.02
Variance: 651.4174914
Monotonicity: Not monotonic
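
Several of these statistics are derived from one another. The sketch below recomputes the derived ones for Trip_Distance from the reported mean, standard deviation, quartiles and extremes, using the usual definitions (variance as the squared standard deviation, CV as standard deviation over mean). It works only from the figures printed in this report, not from the data.

```python
import math

# Values copied from the Trip_Distance statistics in this report.
mean = 44.20090854
std = 25.52288172
q1, q3 = 24.58, 60.73
minimum, maximum = 0.31, 109.23

variance = std ** 2            # reported: 651.4174914
cv = std / mean                # reported: 0.5774288938
iqr = q3 - q1                  # reported: 36.15
value_range = maximum - minimum  # reported: 108.92

for name, value in [("variance", variance), ("cv", cv),
                    ("iqr", iqr), ("range", value_range)]:
    print(f"{name}: {value:.4f}")

# The recomputed figures agree with the report up to rounding.
assert math.isclose(variance, 651.4174914, rel_tol=1e-6)
assert math.isclose(cv, 0.5774288938, rel_tol=1e-6)
```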
[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
30.18 | 65 | < 0.1%
30.74 | 63 | < 0.1%
29.54 | 61 | < 0.1%
31.66 | 60 | < 0.1%
29.58 | 60 | < 0.1%
31.02 | 59 | < 0.1%
32.22 | 58 | < 0.1%
30.78 | 58 | < 0.1%
31.06 | 58 | < 0.1%
32.1 | 58 | < 0.1%
Other values (10316) | 131062 | 99.5%

Minimum 10 values
Value | Count | Frequency (%)
0.31 | 1 | < 0.1%
1.53 | 1 | < 0.1%
1.54 | 1 | < 0.1%
1.55 | 2 | < 0.1%
1.56 | 2 | < 0.1%
1.59 | 1 | < 0.1%
1.6 | 2 | < 0.1%
1.61 | 2 | < 0.1%
1.62 | 2 | < 0.1%
1.64 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
109.23 | 12 | < 0.1%
109.22 | 12 | < 0.1%
109.21 | 12 | < 0.1%
109.2 | 13 | < 0.1%
109.19 | 17 | < 0.1%
109.18 | 13 | < 0.1%
109.17 | 17 | < 0.1%
109.16 | 16 | < 0.1%
109.15 | 14 | < 0.1%
109.14 | 17 | < 0.1%

Type_of_Cab
Categorical

HIGH CORRELATION
MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 20210
Missing (%): 15.3%
Memory size: 1.0 MiB

Value | Count
B | 31136
C | 28122
A | 21569
D | 18991
E | 11634

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 111452
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: B
2nd row: B
3rd row: C
4th row: C
5th row: E

Common Values

Value | Count | Frequency (%)
B | 31136 | 23.6%
C | 28122 | 21.4%
A | 21569 | 16.4%
D | 18991 | 14.4%
E | 11634 | 8.8%
(Missing) | 20210 | 15.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category frequency plot]

Value | Count | Frequency (%)
b | 31136 | 27.9%
c | 28122 | 25.2%
a | 21569 | 19.4%
d | 18991 | 17.0%
e | 11634 | 10.4%

Most occurring characters

Value | Count | Frequency (%)
B | 31136 | 27.9%
C | 28122 | 25.2%
A | 21569 | 19.4%
D | 18991 | 17.0%
E | 11634 | 10.4%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 111452 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
B | 31136 | 27.9%
C | 28122 | 25.2%
A | 21569 | 19.4%
D | 18991 | 17.0%
E | 11634 | 10.4%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 111452 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
B | 31136 | 27.9%
C | 28122 | 25.2%
A | 21569 | 19.4%
D | 18991 | 17.0%
E | 11634 | 10.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 111452 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
B | 31136 | 27.9%
C | 28122 | 25.2%
A | 21569 | 19.4%
D | 18991 | 17.0%
E | 11634 | 10.4%

Customer_Since_Months
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 11
Distinct (%): < 0.1%
Missing: 5920
Missing (%): 4.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.0166611
Minimum: 0
Maximum: 10
Zeros: 10169
Zeros (%): 7.7%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
Median: 6
Q3: 10
95-th percentile: 10
Maximum: 10
Range: 10
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.626887096
Coefficient of variation (CV): 0.6028072774
Kurtosis: -1.443862014
Mean: 6.0166611
Median Absolute Deviation (MAD): 4
Skewness: -0.2469539574
Sum: 756547
Variance: 13.15431001
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=11)]

Common values
Value | Count | Frequency (%)
10 | 42680 | 32.4%
2 | 11621 | 8.8%
3 | 10351 | 7.9%
0 | 10169 | 7.7%
5 | 8641 | 6.6%
1 | 8297 | 6.3%
4 | 7726 | 5.9%
7 | 7407 | 5.6%
6 | 7375 | 5.6%
8 | 6328 | 4.8%
(Missing) | 5920 | 4.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 10169 | 7.7%
1 | 8297 | 6.3%
2 | 11621 | 8.8%
3 | 10351 | 7.9%
4 | 7726 | 5.9%
5 | 8641 | 6.6%
6 | 7375 | 5.6%
7 | 7407 | 5.6%
8 | 6328 | 4.8%
9 | 5147 | 3.9%

Maximum 10 values
Value | Count | Frequency (%)
10 | 42680 | 32.4%
9 | 5147 | 3.9%
8 | 6328 | 4.8%
7 | 7407 | 5.6%
6 | 7375 | 5.6%
5 | 8641 | 6.6%
4 | 7726 | 5.9%
3 | 10351 | 7.9%
2 | 11621 | 8.8%
1 | 8297 | 6.3%

Life_Style_Index
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 55978
Distinct (%): 50.2%
Missing: 20193
Missing (%): 15.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.802064
Minimum: 1.59638
Maximum: 4.87511
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 1.59638
5-th percentile: 2.445944
Q1: 2.65473
Median: 2.79805
Q3: 2.94678
95-th percentile: 3.174166
Maximum: 4.87511
Range: 3.27873
Interquartile range (IQR): 0.29205

Descriptive statistics

Standard deviation: 0.225795783
Coefficient of variation (CV): 0.08058195066
Kurtosis: 1.111678876
Mean: 2.802064
Median Absolute Deviation (MAD): 0.14496
Skewness: 0.1939941209
Sum: 312343.272
Variance: 0.05098373561
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
2.70697 | 14 | < 0.1%
2.77608 | 14 | < 0.1%
2.78465 | 13 | < 0.1%
2.78127 | 12 | < 0.1%
2.78388 | 12 | < 0.1%
2.69489 | 12 | < 0.1%
2.77447 | 12 | < 0.1%
2.77256 | 12 | < 0.1%
2.7037 | 11 | < 0.1%
2.77651 | 11 | < 0.1%
Other values (55968) | 111346 | 84.6%
(Missing) | 20193 | 15.3%

Minimum 10 values
Value | Count | Frequency (%)
1.59638 | 1 | < 0.1%
1.65696 | 1 | < 0.1%
1.67906 | 1 | < 0.1%
1.68789 | 1 | < 0.1%
1.73656 | 1 | < 0.1%
1.78604 | 1 | < 0.1%
1.82092 | 1 | < 0.1%
1.83563 | 1 | < 0.1%
1.83727 | 1 | < 0.1%
1.84286 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4.87511 | 1 | < 0.1%
4.85378 | 1 | < 0.1%
4.69012 | 1 | < 0.1%
4.65904 | 1 | < 0.1%
4.59115 | 1 | < 0.1%
4.30083 | 1 | < 0.1%
4.20421 | 1 | < 0.1%
4.13408 | 1 | < 0.1%
4.09689 | 1 | < 0.1%
4.07923 | 1 | < 0.1%

Confidence_Life_Style_Index
Categorical

MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 20193
Missing (%): 15.3%
Memory size: 1.0 MiB

Value | Count
B | 40355
C | 35967
A | 35147

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 111469
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: B
3rd row: B
4th row: C
5th row: B

Common Values

Value | Count | Frequency (%)
B | 40355 | 30.7%
C | 35967 | 27.3%
A | 35147 | 26.7%
(Missing) | 20193 | 15.3%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category frequency plot]

Value | Count | Frequency (%)
b | 40355 | 36.2%
c | 35967 | 32.3%
a | 35147 | 31.5%

Most occurring characters

Value | Count | Frequency (%)
B | 40355 | 36.2%
C | 35967 | 32.3%
A | 35147 | 31.5%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 111469 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
B | 40355 | 36.2%
C | 35967 | 32.3%
A | 35147 | 31.5%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 111469 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
B | 40355 | 36.2%
C | 35967 | 32.3%
A | 35147 | 31.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 111469 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
B | 40355 | 36.2%
C | 35967 | 32.3%
A | 35147 | 31.5%

Destination_Type
Categorical

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Value | Count
A | 77597
B | 29555
C | 7484
D | 6588
E | 2717
Other values (9) | 7721

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 131662
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: A
2nd row: A
3rd row: E
4th row: A
5th row: A

Common Values

Value | Count | Frequency (%)
A | 77597 | 58.9%
B | 29555 | 22.4%
C | 7484 | 5.7%
D | 6588 | 5.0%
E | 2717 | 2.1%
F | 1950 | 1.5%
G | 1489 | 1.1%
H | 1260 | 1.0%
I | 813 | 0.6%
J | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Length

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
a | 77597 | 58.9%
b | 29555 | 22.4%
c | 7484 | 5.7%
d | 6588 | 5.0%
e | 2717 | 2.1%
f | 1950 | 1.5%
g | 1489 | 1.1%
h | 1260 | 1.0%
i | 813 | 0.6%
j | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
A | 77597 | 58.9%
B | 29555 | 22.4%
C | 7484 | 5.7%
D | 6588 | 5.0%
E | 2717 | 2.1%
F | 1950 | 1.5%
G | 1489 | 1.1%
H | 1260 | 1.0%
I | 813 | 0.6%
J | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 131662 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
A | 77597 | 58.9%
B | 29555 | 22.4%
C | 7484 | 5.7%
D | 6588 | 5.0%
E | 2717 | 2.1%
F | 1950 | 1.5%
G | 1489 | 1.1%
H | 1260 | 1.0%
I | 813 | 0.6%
J | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 131662 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
A | 77597 | 58.9%
B | 29555 | 22.4%
C | 7484 | 5.7%
D | 6588 | 5.0%
E | 2717 | 2.1%
F | 1950 | 1.5%
G | 1489 | 1.1%
H | 1260 | 1.0%
I | 813 | 0.6%
J | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 131662 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
A | 77597 | 58.9%
B | 29555 | 22.4%
C | 7484 | 5.7%
D | 6588 | 5.0%
E | 2717 | 2.1%
F | 1950 | 1.5%
G | 1489 | 1.1%
H | 1260 | 1.0%
I | 813 | 0.6%
J | 695 | 0.5%
Other values (4) | 1514 | 1.1%

Customer_Rating
Real number (ℝ≥0)

Distinct: 3931
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.849457959
Minimum: 0.00125
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 0.00125
5-th percentile: 1.1575
Q1: 2.1525
Median: 2.895
Q3: 3.5825
95-th percentile: 4.39375
Maximum: 5
Range: 4.99875
Interquartile range (IQR): 1.43

Descriptive statistics

Standard deviation: 0.9806752996
Coefficient of variation (CV): 0.3441620525
Kurtosis: -0.5402144493
Mean: 2.849457959
Median Absolute Deviation (MAD): 0.7125
Skewness: -0.191130587
Sum: 375165.3338
Variance: 0.9617240432
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
2.75 | 122 | 0.1%
3.5 | 120 | 0.1%
3.35 | 116 | 0.1%
2.6 | 112 | 0.1%
3.2 | 109 | 0.1%
3.05 | 102 | 0.1%
2.3 | 99 | 0.1%
2.675 | 97 | 0.1%
3.65 | 96 | 0.1%
2.45 | 96 | 0.1%
Other values (3921) | 130593 | 99.2%

Minimum 10 values
Value | Count | Frequency (%)
0.00125 | 2 | < 0.1%
0.0025 | 1 | < 0.1%
0.00375 | 3 | < 0.1%
0.00625 | 1 | < 0.1%
0.00875 | 1 | < 0.1%
0.01 | 2 | < 0.1%
0.0125 | 1 | < 0.1%
0.01375 | 2 | < 0.1%
0.015 | 3 | < 0.1%
0.01625 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
5 | 87 | 0.1%
4.99875 | 1 | < 0.1%
4.9975 | 4 | < 0.1%
4.99625 | 1 | < 0.1%
4.995 | 1 | < 0.1%
4.99375 | 2 | < 0.1%
4.99125 | 2 | < 0.1%
4.99 | 5 | < 0.1%
4.98875 | 1 | < 0.1%
4.98625 | 6 | < 0.1%

Cancellation_Last_1Month
Real number (ℝ≥0)

ZEROS

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7828378727
Minimum: 0
Maximum: 8
Zeros: 68687
Zeros (%): 52.2%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.037559213
Coefficient of variation (CV): 1.325381984
Kurtosis: 2.700365474
Mean: 0.7828378727
Median Absolute Deviation (MAD): 0
Skewness: 1.550869334
Sum: 103070
Variance: 1.07652912
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=9)]

Common values
Value | Count | Frequency (%)
0 | 68687 | 52.2%
1 | 36834 | 28.0%
2 | 16223 | 12.3%
3 | 7142 | 5.4%
4 | 1823 | 1.4%
5 | 668 | 0.5%
6 | 266 | 0.2%
7 | 16 | < 0.1%
8 | 3 | < 0.1%

Minimum values
Value | Count | Frequency (%)
0 | 68687 | 52.2%
1 | 36834 | 28.0%
2 | 16223 | 12.3%
3 | 7142 | 5.4%
4 | 1823 | 1.4%
5 | 668 | 0.5%
6 | 266 | 0.2%
7 | 16 | < 0.1%
8 | 3 | < 0.1%

Maximum values
Value | Count | Frequency (%)
8 | 3 | < 0.1%
7 | 16 | < 0.1%
6 | 266 | 0.2%
5 | 668 | 0.5%
4 | 1823 | 1.4%
3 | 7142 | 5.4%
2 | 16223 | 12.3%
1 | 36834 | 28.0%
0 | 68687 | 52.2%

Var1
Real number (ℝ≥0)

MISSING

Distinct: 122
Distinct (%): 0.2%
Missing: 71030
Missing (%): 53.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 64.20269825
Minimum: 30
Maximum: 210
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 30
5-th percentile: 35
Q1: 46
Median: 61
Q3: 80
95-th percentile: 104
Maximum: 210
Range: 180
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 21.82044669
Coefficient of variation (CV): 0.3398680629
Kurtosis: -0.7106353922
Mean: 64.20269825
Median Absolute Deviation (MAD): 17
Skewness: 0.4654008052
Sum: 3892738
Variance: 476.1318936
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
39 | 1259 | 1.0%
42 | 1221 | 0.9%
38 | 1208 | 0.9%
43 | 1190 | 0.9%
37 | 1180 | 0.9%
40 | 1163 | 0.9%
36 | 1123 | 0.9%
48 | 1123 | 0.9%
45 | 1107 | 0.8%
44 | 1087 | 0.8%
Other values (112) | 48971 | 37.2%
(Missing) | 71030 | 53.9%

Minimum 10 values
Value | Count | Frequency (%)
30 | 269 | 0.2%
31 | 411 | 0.3%
32 | 523 | 0.4%
33 | 590 | 0.4%
34 | 761 | 0.6%
35 | 874 | 0.7%
36 | 1123 | 0.9%
37 | 1180 | 0.9%
38 | 1208 | 0.9%
39 | 1259 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
210 | 1 | < 0.1%
200 | 1 | < 0.1%
179 | 1 | < 0.1%
173 | 1 | < 0.1%
171 | 1 | < 0.1%
169 | 1 | < 0.1%
165 | 2 | < 0.1%
163 | 1 | < 0.1%
161 | 1 | < 0.1%
160 | 1 | < 0.1%

Var2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 58
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51.20279959
Minimum: 40
Maximum: 124
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 40
5-th percentile: 45
Q1: 48
Median: 50
Q3: 54
95-th percentile: 60
Maximum: 124
Range: 84
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.986141501
Coefficient of variation (CV): 0.09738025148
Kurtosis: 3.024079407
Mean: 51.20279959
Median Absolute Deviation (MAD): 3
Skewness: 1.184633919
Sum: 6741463
Variance: 24.86160707
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
49 | 12445 | 9.5%
50 | 11981 | 9.1%
48 | 11878 | 9.0%
51 | 11025 | 8.4%
47 | 10479 | 8.0%
52 | 9837 | 7.5%
46 | 8715 | 6.6%
53 | 8554 | 6.5%
54 | 6980 | 5.3%
45 | 6023 | 4.6%
Other values (48) | 33745 | 25.6%

Minimum 10 values
Value | Count | Frequency (%)
40 | 3 | < 0.1%
41 | 25 | < 0.1%
42 | 486 | 0.4%
43 | 1461 | 1.1%
44 | 3604 | 2.7%
45 | 6023 | 4.6%
46 | 8715 | 6.6%
47 | 10479 | 8.0%
48 | 11878 | 9.0%
49 | 12445 | 9.5%

Maximum 10 values
Value | Count | Frequency (%)
124 | 1 | < 0.1%
101 | 1 | < 0.1%
98 | 1 | < 0.1%
95 | 1 | < 0.1%
94 | 1 | < 0.1%
93 | 2 | < 0.1%
92 | 1 | < 0.1%
91 | 1 | < 0.1%
90 | 2 | < 0.1%
89 | 5 | < 0.1%

Var3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 96
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75.0990187
Minimum: 52
Maximum: 206
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.0 MiB

Quantile statistics

Minimum: 52
5-th percentile: 59
Q1: 67
Median: 74
Q3: 82
95-th percentile: 97
Maximum: 206
Range: 154
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 11.57827814
Coefficient of variation (CV): 0.1541734944
Kurtosis: 1.03442169
Mean: 75.0990187
Median Absolute Deviation (MAD): 8
Skewness: 0.828976581
Sum: 9887687
Variance: 134.0565247
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common values
Value | Count | Frequency (%)
71 | 4878 | 3.7%
70 | 4824 | 3.7%
72 | 4808 | 3.7%
73 | 4737 | 3.6%
69 | 4710 | 3.6%
68 | 4650 | 3.5%
74 | 4632 | 3.5%
67 | 4621 | 3.5%
75 | 4562 | 3.5%
66 | 4456 | 3.4%
Other values (86) | 84784 | 64.4%

Minimum 10 values
Value | Count | Frequency (%)
52 | 11 | < 0.1%
53 | 94 | 0.1%
54 | 392 | 0.3%
55 | 645 | 0.5%
56 | 1030 | 0.8%
57 | 1355 | 1.0%
58 | 1774 | 1.3%
59 | 2063 | 1.6%
60 | 2513 | 1.9%
61 | 3086 | 2.3%

Maximum 10 values
Value | Count | Frequency (%)
206 | 1 | < 0.1%
174 | 1 | < 0.1%
166 | 1 | < 0.1%
155 | 3 | < 0.1%
147 | 1 | < 0.1%
142 | 1 | < 0.1%
141 | 3 | < 0.1%
140 | 4 | < 0.1%
139 | 3 | < 0.1%
138 | 3 | < 0.1%

Gender
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Value | Count
Male | 93900
Female | 37762

Length

Max length: 6
Median length: 4
Mean length: 4.573620331
Min length: 4

Characters and Unicode

Total characters: 602172
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Female
2nd row: Male
3rd row: Male
4th row: Male
5th row: Male

Common Values

Value | Count | Frequency (%)
Male | 93900 | 71.3%
Female | 37762 | 28.7%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category frequency plot]

Value | Count | Frequency (%)
male | 93900 | 71.3%
female | 37762 | 28.7%

Most occurring characters

Value | Count | Frequency (%)
e | 169424 | 28.1%
a | 131662 | 21.9%
l | 131662 | 21.9%
M | 93900 | 15.6%
F | 37762 | 6.3%
m | 37762 | 6.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 470510 | 78.1%
Uppercase Letter | 131662 | 21.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 169424 | 36.0%
a | 131662 | 28.0%
l | 131662 | 28.0%
m | 37762 | 8.0%
Uppercase Letter
Value | Count | Frequency (%)
M | 93900 | 71.3%
F | 37762 | 28.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 602172 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 169424 | 28.1%
a | 131662 | 21.9%
l | 131662 | 21.9%
M | 93900 | 15.6%
F | 37762 | 6.3%
m | 37762 | 6.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 602172 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 169424 | 28.1%
a | 131662 | 21.9%
l | 131662 | 21.9%
M | 93900 | 15.6%
F | 37762 | 6.3%
m | 37762 | 6.3%

Surge_Pricing_Type
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.0 MiB

Value | Count
2 | 56728
3 | 47720
1 | 27214

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 131662
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 3
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Length

[Figure: Histogram of lengths of the category]

Category Frequency Plot

[Figure: Category frequency plot]

Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Most occurring characters

Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 131662 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 131662 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 131662 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 56728 | 43.1%
3 | 47720 | 36.2%
1 | 27214 | 20.7%

Interactions

[Figures: 64 interaction plots]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
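
A minimal pure-Python sketch of that recipe: rank both variables (ties receive the average rank), then divide the covariance of the ranks by the product of their standard deviations. The function names and the toy data are illustrative assumptions, not the report's implementation.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(round(rho, 6), r < 1)
```

On the toy data ρ is 1 (up to floating-point error) while Pearson's r stays below 1, which is the difference between monotonic and linear correlation described above.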

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
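
The invariance under location and scale changes can be checked numerically. The sketch below uses made-up data and positive scale factors (a negative factor would flip the sign of r); `pearson_r` is an illustrative helper, not the report's code.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Made-up, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shift and rescale each variable separately; r should not change.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(abs(r - r_transformed) < 1e-9)  # True
```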

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
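
The formula as stated corresponds to the variant usually called tau-a, which applies no correction for ties. The sketch below implements exactly that on toy data; whether the report itself uses a tie-corrected variant is not stated here, so treat this as an illustration of the definition only.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(round(tau, 4))  # 0.6667
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.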

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
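
A sketch of the bias-corrected estimator, following the commonly cited form of Bergsma's correction: the chi-squared statistic is divided by n, the expected bias (k-1)(r-1)/(n-1) is subtracted, and the numbers of rows and columns are shrunk accordingly. The function name and toy data are assumptions for illustration; the report's exact implementation is not shown here.

```python
import math
from collections import Counter

def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    rows, cols = sorted(set(a)), sorted(set(b))
    r, k = len(rows), len(cols)
    joint = Counter(zip(a, b))
    row_tot, col_tot = Counter(a), Counter(b)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for i in rows:
        for j in cols:
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (joint[(i, j)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy data: one pair of perfectly associated columns, one independent pair.
v_perfect = cramers_v_corrected(["A", "B", "C"] * 20, ["x", "y", "z"] * 20)
v_indep = cramers_v_corrected(["A", "A", "B", "B"] * 15, ["x", "y", "x", "y"] * 15)
print(round(v_perfect, 6), v_indep)
```

The perfectly associated pair yields a value of 1 (up to floating-point error) and the independent pair yields 0, matching the range described above.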

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
[Figure] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
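
Nullity correlation, as shown in the heatmap, can be illustrated as the ordinary correlation between per-column "is missing" indicators. The sketch below uses only the ten rows listed under "First rows" in this report, so the resulting numbers describe that tiny sample, not the full dataset; the helper name is illustrative.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# 1 = value is NaN in that row, 0 = value present (rows 0..9 of "First rows").
life_style_missing = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
confidence_missing = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
var1_missing       = [0, 0, 1, 1, 0, 0, 0, 0, 1, 1]

r_same = correlation(life_style_missing, confidence_missing)
r_var1 = correlation(life_style_missing, var1_missing)
print(round(r_same, 3), round(r_var1, 3))  # 1.0 0.583
```

In this sample Life_Style_Index and Confidence_Life_Style_Index are always missing together, consistent with their identical missing counts (20193) in the full report, while Var1 is only partly aligned with them.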

Sample

First rows

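(index) | Trip_ID | Trip_Distance | Type_of_Cab | Customer_Since_Months | Life_Style_Index | Confidence_Life_Style_Index | Destination_Type | Customer_Rating | Cancellation_Last_1Month | Var1 | Var2 | Var3 | Gender | Surge_Pricing_Type
0 | T0005689460 | 6.77 | B | 1.0 | 2.42769 | A | A | 3.90500 | 0 | 40.0 | 46 | 60 | Female | 2
1 | T0005689461 | 29.47 | B | 10.0 | 2.78245 | B | A | 3.45000 | 0 | 38.0 | 56 | 78 | Male | 2
2 | T0005689464 | 41.58 | NaN | 10.0 | NaN | NaN | E | 3.50125 | 2 | NaN | 56 | 77 | Male | 2
3 | T0005689465 | 61.56 | C | 10.0 | NaN | NaN | A | 3.45375 | 0 | NaN | 52 | 74 | Male | 3
4 | T0005689467 | 54.95 | C | 10.0 | 3.03453 | B | A | 3.40250 | 4 | 51.0 | 49 | 102 | Male | 2
5 | T0005689469 | 19.06 | E | 10.0 | NaN | NaN | A | 2.59750 | 1 | 72.0 | 63 | 91 | Male | 3
6 | T0005689470 | 29.72 | E | 10.0 | 2.83958 | C | B | 2.97500 | 1 | 83.0 | 50 | 75 | Male | 2
7 | T0005689472 | 18.44 | B | 2.0 | 2.81871 | B | A | 3.58250 | 0 | 103.0 | 46 | 63 | Male | 2
8 | T0005689473 | 106.80 | C | 3.0 | NaN | NaN | A | 3.14625 | 0 | NaN | 58 | 92 | Male | 2
9 | T0005689474 | 107.19 | D | 5.0 | 3.04467 | B | A | 2.44375 | 1 | NaN | 58 | 83 | Male | 3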

Last rows

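(index) | Trip_ID | Trip_Distance | Type_of_Cab | Customer_Since_Months | Life_Style_Index | Confidence_Life_Style_Index | Destination_Type | Customer_Rating | Cancellation_Last_1Month | Var1 | Var2 | Var3 | Gender | Surge_Pricing_Type
131652 | T0005908497 | 29.76 | D | 4.0 | 2.75826 | C | A | 3.70625 | 0 | NaN | 50 | 62 | Male | 3
131653 | T0005908498 | 20.42 | B | 3.0 | 2.59182 | C | G | 4.79750 | 1 | NaN | 45 | 63 | Male | 1
131654 | T0005908506 | 40.15 | E | 4.0 | NaN | NaN | A | 2.74625 | 1 | NaN | 49 | 63 | Female | 3
131655 | T0005908507 | 20.18 | NaN | 10.0 | 2.69374 | C | F | 4.52625 | 1 | 48.0 | 47 | 62 | Female | 3
131656 | T0005908508 | 22.90 | D | 10.0 | 2.51438 | A | A | 1.47250 | 2 | 33.0 | 52 | 78 | Female | 3
131657 | T0005908509 | 11.72 | D | 1.0 | 2.74229 | A | A | 3.28500 | 0 | 61.0 | 47 | 76 | Male | 3
131658 | T0005908510 | 74.81 | C | 7.0 | 2.81059 | C | A | 0.44500 | 0 | NaN | 63 | 88 | Male | 2
131659 | T0005908512 | 40.17 | C | 10.0 | 2.99565 | B | A | 3.33625 | 0 | NaN | 48 | 75 | Female | 2
131660 | T0005908513 | 46.88 | B | 4.0 | 3.04744 | A | B | 4.15750 | 1 | 47.0 | 54 | 79 | Male | 2
131661 | T0005908514 | 31.96 | A | 7.0 | 2.93773 | A | E | 2.63875 | 1 | 102.0 | 57 | 85 | Male | 1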